Business Understanding

Business Objectives

Divvy, a bike-sharing service, provides ride data across different user groups (members vs. casual riders) and time periods. Understanding usage patterns, ride behaviors, and station demand is critical for optimizing operations, improving customer experience, and informing marketing strategies.

Although Divvy collects large amounts of trip data, it is not yet clear how user type, ride duration, and temporal patterns (day of week, hour, season) interact to shape ridership. It is also unclear which stations are most popular and how their popularity shifts across months and seasons.

Problem Statement

How do ride behaviors differ between casual riders and members across time (daily, weekly, seasonal) and space (stations and routes)? Which patterns in ride duration, station popularity, and demand trends could inform Divvy’s operational and marketing strategies?


Data Understanding

Data Collection

Data Description

Data structure

str(penguins)
## 'data.frame':    344 obs. of  8 variables:
##  $ species    : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ island     : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ bill_len   : num  39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##  $ bill_dep   : num  18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
##  $ flipper_len: int  181 186 195 NA 193 190 181 195 193 190 ...
##  $ body_mass  : int  3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
##  $ sex        : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
##  $ year       : int  2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...

Basic summary

skim(penguins)
Data summary
Name                     penguins
Number of rows           344
Number of columns        8
_______________________
Column type frequency:
  factor                 3
  numeric                5
________________________
Group variables          None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
species                0           1.00  FALSE           3  Ade: 152, Gen: 124, Chi: 68
island                 0           1.00  FALSE           3  Bis: 168, Dre: 124, Tor: 52
sex                   11           0.97  FALSE           2  mal: 168, fem: 165

Variable type: numeric

skim_variable  n_missing  complete_rate     mean      sd      p0      p25      p50     p75    p100  hist
bill_len               2           0.99    43.92    5.46    32.1    39.23    44.45    48.5    59.6  ▃▇▇▆▁
bill_dep               2           0.99    17.15    1.97    13.1    15.60    17.30    18.7    21.5  ▅▅▇▇▂
flipper_len            2           0.99   200.92   14.06   172.0   190.00   197.00   213.0   231.0  ▂▇▃▅▂
body_mass              2           0.99  4201.75  801.95  2700.0  3550.00  4050.00  4750.0  6300.0  ▃▇▆▃▂
year                   0           1.00  2008.03    0.82  2007.0  2007.00  2008.00  2009.0  2009.0  ▇▁▇▁▇

Statistical summary

summary(penguins)
##       species          island       bill_len        bill_dep    
##  Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
##  Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60  
##  Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30  
##                                  Mean   :43.92   Mean   :17.15  
##                                  3rd Qu.:48.50   3rd Qu.:18.70  
##                                  Max.   :59.60   Max.   :21.50  
##                                  NA's   :2       NA's   :2      
##   flipper_len      body_mass        sex           year     
##  Min.   :172.0   Min.   :2700   female:165   Min.   :2007  
##  1st Qu.:190.0   1st Qu.:3550   male  :168   1st Qu.:2007  
##  Median :197.0   Median :4050   NA's  : 11   Median :2008  
##  Mean   :200.9   Mean   :4202                Mean   :2008  
##  3rd Qu.:213.0   3rd Qu.:4750                3rd Qu.:2009  
##  Max.   :231.0   Max.   :6300                Max.   :2009  
##  NA's   :2       NA's   :2


Data Dictionary

Attributes   Data Types  Descriptions  Constraints
species      Two.        Three.        Four.
island       Six.        Seven.        Eight.
bill_len     Ten.        Eleven.       Twelve.
bill_dep     Ten.        Eleven.       Twelve.
flipper_len  Ten.        Eleven.       Twelve.
body_mass    Ten.        Eleven.       Twelve.
sex          Ten.        Eleven.       Twelve.
year         Ten.        Eleven.       Twelve.



Data Preparation

Database & Schema Setup

Set database connections

Create a connection to the PostgreSQL database

# Packages used below (read.ini is assumed to come from the ini package)
library(ini)
library(DBI)
library(RPostgres)

# Read config
config <- read.ini("resources/db_config.ini")
db <- config$postgresql

# Safe database connection: stop with a clear message if it fails
con <- tryCatch(
  dbConnect(
    Postgres(),
    host = db$host,
    dbname = db$database,
    user = db$user,
    password = db$password,
    port = as.integer(db$port)
  ),
  error = function(e) {
    stop("Database connection failed: ", e$message)
  }
)
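
A failed connection is often caused by an incomplete config file rather than by the database itself. The check below is a minimal sketch of validating the [postgresql] section before calling dbConnect. The helper name validate_db_config and the placeholder values in db_example are assumptions made for illustration; only the five field names come from the connection code above.

```r
# Hypothetical helper: confirm the config section has every field the
# connection code reads, and that port can be converted to an integer.
validate_db_config <- function(db) {
  required <- c("host", "database", "user", "password", "port")
  absent <- setdiff(required, names(db))
  if (length(absent) > 0) {
    stop("db_config.ini is missing: ", paste(absent, collapse = ", "))
  }
  port <- suppressWarnings(as.integer(db$port))
  if (is.na(port)) {
    stop("port must be an integer, got: ", db$port)
  }
  invisible(TRUE)
}

# Stand-in for config$postgresql; placeholder values, not real credentials
db_example <- list(
  host = "localhost",
  database = "divvy_db",
  user = "analyst",
  password = "secret",
  port = "5432"
)
validate_db_config(db_example)
```

Calling validate_db_config(db) right after reading the config would surface a missing key by name instead of a generic connection error.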

Create schema and table

Create the divvy schema and a single trips table

# Ensure schema exists
dbExecute(con, "CREATE SCHEMA IF NOT EXISTS divvy;")
## [1] 0
# Create one big trips table
dbExecute(con, "
  CREATE TABLE IF NOT EXISTS divvy.trips (
    ride_id             TEXT PRIMARY KEY,
    rideable_type       TEXT,
    started_at          TIMESTAMP,
    ended_at            TIMESTAMP,
    start_station_name  TEXT,
    start_station_id    TEXT,
    end_station_name    TEXT,
    end_station_id      TEXT,
    start_lat           NUMERIC,
    start_lng           NUMERIC,
    end_lat             NUMERIC,
    end_lng             NUMERIC,
    member_casual       TEXT,
    trip_month          TEXT   -- extra column for month tracking
  );
")
## [1] 0

Import Data

Load all 12 monthly CSVs into divvy.trips

library(readr)  # read_csv

# Define months and year
months <- sprintf("%02d", 1:12)
year <- "2024"

for (m in months) {
  file_path <- paste0("resources/data/", year, m, "-divvy-tripdata.csv")
  # month.name is locale-independent, unlike format(date, "%B")
  month_name <- tolower(month.name[as.integer(m)])

  tryCatch({
    # Read CSV
    df <- read_csv(file_path, show_col_types = FALSE)

    # Add trip_month column
    df$trip_month <- month_name

    # Stage the month in a temporary table (columns follow the CSV order,
    # which is assumed to match the column order of divvy.trips)
    dbWriteTable(con, "trips_staging", df, temporary = TRUE, overwrite = TRUE)

    # Append with deduplication via ON CONFLICT on the ride_id primary key,
    # so re-running the loop skips rows that are already loaded
    dbExecute(con, "
      INSERT INTO divvy.trips
      SELECT * FROM trips_staging
      ON CONFLICT (ride_id) DO NOTHING;
    ")

    message(paste(month_name, "data loaded successfully."))

  }, error = function(e) {
    message(paste("Error with", month_name, ":", e$message))
  })
}
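
Because the loop catches errors and moves on, a missing file only shows up as a message in the middle of the run. A minimal sketch of a pre-flight check, assuming the same resources/data/ layout and file name pattern as above; the manifest data frame and its found column are names introduced here for illustration. Month labels come from R's built-in month.name, which does not depend on the session locale.

```r
# Illustrative pre-flight check: list the twelve expected files and flag
# any that are not on disk before starting the import.
months <- sprintf("%02d", 1:12)
year <- "2024"

manifest <- data.frame(
  trip_month = tolower(month.name),
  file_path = paste0("resources/data/", year, months, "-divvy-tripdata.csv"),
  stringsAsFactors = FALSE
)
manifest$found <- file.exists(manifest$file_path)

missing_files <- manifest$file_path[!manifest$found]
if (length(missing_files) > 0) {
  message("Missing CSVs: ", paste(missing_files, collapse = ", "))
}
```

The trip_month labels in manifest are the same lowercase English month names that later queries filter on, such as 'december'.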

Create a Master View

Combine all trips into a dynamic view

dbExecute(con, "
  CREATE OR REPLACE VIEW divvy.all_trips AS
  SELECT * FROM divvy.trips;
")
## [1] 0

Exploratory Data Analysis

dbGetQuery(con, "
            SELECT *
            FROM divvy.trips
            WHERE trip_month = 'december'
            LIMIT 20;
          ")
##  [1] ride_id            rideable_type      started_at         ended_at          
##  [5] start_station_name start_station_id   end_station_name   end_station_id    
##  [9] start_lat          start_lng          end_lat            end_lng           
## [13] member_casual      trip_month        
## <0 rows> (or 0-length row.names)
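
The questions in the problem statement rely on fields that divvy.trips does not store directly, such as ride duration, start hour, and day of week. A minimal sketch of deriving them in base R, using a four-row toy data frame (invented values, not Divvy data) that borrows the table's column names:

```r
# Toy rows for illustration only; real rows would come from dbGetQuery()
trips <- data.frame(
  ride_id = c("A1", "A2", "A3", "A4"),
  started_at = as.POSIXct(
    c("2024-07-06 09:00:00", "2024-07-06 17:30:00",
      "2024-07-08 08:15:00", "2024-07-08 18:00:00"),
    tz = "UTC"
  ),
  ended_at = as.POSIXct(
    c("2024-07-06 09:45:00", "2024-07-06 18:10:00",
      "2024-07-08 08:27:00", "2024-07-08 18:09:00"),
    tz = "UTC"
  ),
  member_casual = c("casual", "casual", "member", "member"),
  stringsAsFactors = FALSE
)

# Ride duration in minutes
trips$ride_minutes <- as.numeric(
  difftime(trips$ended_at, trips$started_at, units = "mins")
)

# Hour of day (0-23) and day of week (0 = Sunday ... 6 = Saturday)
trips$start_hour <- as.integer(format(trips$started_at, "%H"))
trips$weekday <- as.POSIXlt(trips$started_at)$wday

# Mean ride duration per rider type
by_type <- aggregate(ride_minutes ~ member_casual, data = trips, FUN = mean)
print(by_type)
```

The same derived columns can then be grouped by start_station_name or trip_month to look at station popularity and seasonal demand.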



Modeling



Evaluation



Deployment
